ABSTRACT

Learning curves have proven to be a useful tool for under- 
standing how a student learns a given skill as they progress 
through a curriculum. A learning curve for a given Knowl- 
edge Component (KC) is a plot of some measure of compe- 
tence as a function of the number of opportunities the stu- 
dent has had to apply that KC. Consider the case where each 
problem-solving step is recorded, for instance, by an in-
telligent tutoring system. In this case, one normally assigns a
unique KC to each problem-solving step and the construction 
of the associated learning curves is straightforward. On the 
other hand, many online homework systems only evaluate the 
student’s final answer to a problem. In that case, the student 
has generally applied a number of KCs to find the answer and 
their performance on the problem is some composite of their 
mastery of all of the requisite KCs. In this paper, we propose 
a simple method for generating learning curves for multiple- 
KC problems that is independent of any particular theory of 
learning. In the case where there is only one KC per prob- 
lem, the method reduces to the ordinary learning curves. We 
demonstrate this method using a set of artificially generated 
student data. 
INTRODUCTION

The increased use of online homework systems and intelli-
gent tutoring systems (ITS) means that ever-increasing amounts
of student log data are available for analysis. This data can be
used to answer two important questions: what skills are stu- 
dents learning and how quickly are they learning them? To be 
more precise, we can equate skills with knowledge compo-
nents (KCs): small bits of information needed to solve a prob-
lem [11, 3]. KCs generally have some sort of prerequisite
relations: for example, you cannot apply the area of a cir-
cle formula $A = \pi r^2$ unless you first know the definition
of “radius of a circle.” However, aside from prerequisites, a
KC can, by definition, be mastered independently from other 
KCs. This definition assumes that KCs are context indepen- 
dent. That is, the student’s ability to apply that KC correctly 
or quickly does not depend on the particular problem the stu- 
dent is solving or the other KCs needed to solve that problem. 


Since KCs are defined to have these properties, it remains
to be seen whether, and in what cases, they are a useful
description of skill acquisition. One way to determine
how well the KC picture is working is to examine the
associated learning curves. If the curves are smooth, vary
monotonically (increasing or decreasing, depending on the
measure of competence), and are independent of context, then
the KC picture is working.


A learning curve is a plot of some measure of mastery of a
skill as a function of the number of opportunities that the stu- 
dent has had to apply that skill. Possible measures of mastery 
include: 


• number of errors made before correctly applying the KC,

• time taken to correctly apply a KC,

• “assistance score,” number of errors plus number of
  requests for help before completing a step, and

• “correctness,” whether the student applied the KC
  correctly without any preceding errors or requests for help.


In the following, we will use “correctness” as our measure of 
competence for a given skill. 


In a typical Intelligent Tutoring System (ITS), the student en- 
ters each problem-solving step into the tutor system. It is 
natural, in that case, to associate one KC with each student 
input and it is relatively straightforward to construct the as- 
sociated learning curves. However, many online homework 
systems only require the student to enter their final answer to 
a problem into the system. In this case, a single input is the
entire problem and it is natural to associate multiple KCs to 
each student input. 


If multiple KCs are associated with a single input, then the 
construction of learning curves is more difficult. If the stu- 
dent gets the problem wrong, which KC is responsible? This 
is sometimes called the “assignment of blame problem” [7, 6, 5].
In the following, a simple method is proposed which
addresses the assignment of blame problem while making a
minimum of theoretical assumptions, allowing one to construct
learning curves for exercises with multiple KCs. Our
strategy is to introduce a model where every point on each
learning curve is identified as a model parameter. These
model parameters, and their associated errors, are then
determined by a maximum likelihood fit to student log data. In
the case of a single KC per problem/step, this reduces to the
usual learning curves.


Table 1. List of definitions and quantities

$k, l, m$: labels representing a KC.

$t, u, v$: labels representing the opportunity number for some KC.

$p$: label representing an exercise.

$s$: the student.

$P_{t,k}$: a model parameter representing the probability that a
student will apply KC $k$ correctly on opportunity $t$;
$P_{t,k} \in [0, 1]$.

$\mathcal{P}_{s,p}$: the model-given probability that student $s$ will get
problem $p$ correct.

$C_{t,k}$: the number of students in the dataset who correctly
applied KC $k$ on opportunity $t$.

$I(\mathbf{t}, \mathbf{k})$: the number of students who got an exercise
containing KCs $\mathbf{k} = \{k_1, k_2, \ldots\}$ incorrect, where
$\mathbf{t} = (t_1, t_2, \ldots)$ is a vector of corresponding opportunities.
This exercise represents opportunity $t_\alpha$ for the student to
apply KC $k_\alpha$.

$\mathcal{T}_{s,p}$: the set of (opportunity, KC) pairs such that problem $p$
is opportunity $t$ for student $s$ to apply KC $k$.


LEARNING CURVE MODEL 

A number of studies have addressed the multiple-KC problem 
in the context of some model of learning, such as Bayesian 
Knowledge Tracing or Performance Factor Analysis [2, 4]. 
In the present work, our goal is simply to construct learning 
curves using a minimum number of model assumptions. Note 
that conventional learning curves themselves make two major 
assumptions: 


1. They average over students. This corresponds to a model 
that does not have any student-specific parameters. 


2. They ignore the problem context. This corresponds to a 
model that does not have any problem-specific parameters. 


In fact, the construction of a learning curve is equivalent to 
fitting the student log data to a model containing a parameter 
representing each KC and step. In other words, if we define
$P_{t,k}$ as the probability that a student will correctly apply KC
$k$ at opportunity $t$, and determine $P_{t,k}$ by fitting to the student
log data, then a plot of $P_{t,k}$ versus $t$ is a learning curve for
KC $k$.
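As a minimal sketch of the single-KC case, the fit reduces to counting: the maximum likelihood estimate of each $P_{t,k}$ is the fraction of students who were correct on opportunity $t$. The log format, a chronological list of (student, KC, correct) rows, and the function name `learning_curve` are assumptions made for illustration.

```python
from collections import defaultdict

def learning_curve(log):
    """Return {(t, kc): fraction correct} from (student, kc, correct) rows.

    Rows are assumed to be in chronological order for each student, and
    each row is assumed to exercise exactly one KC.
    """
    seen = defaultdict(int)      # (student, kc) -> opportunities so far
    correct = defaultdict(int)   # (t, kc) -> number of correct applications
    total = defaultdict(int)     # (t, kc) -> number of attempts
    for student, kc, is_correct in log:
        seen[(student, kc)] += 1
        t = seen[(student, kc)]  # opportunity number, starting at 1
        total[(t, kc)] += 1
        correct[(t, kc)] += int(is_correct)
    return {key: correct[key] / total[key] for key in total}

# Hypothetical log for two students and one KC.
log = [
    ("s1", "A", False), ("s1", "A", True), ("s1", "A", True),
    ("s2", "A", False), ("s2", "A", False), ("s2", "A", True),
]
curve = learning_curve(log)
```

Plotting `curve[(t, "A")]` against `t` gives the conventional learning curve for KC A.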


This gives us a way forward in the multiple-KC case. We 
define a model having parameters $\{P_{t,k}\}$. The associated
log-likelihood is

$$\log(\mathcal{L}) = \sum_{s,\, p \in \mathcal{C}_s} \log(\mathcal{P}_{s,p}) + \sum_{s,\, p \in \mathcal{I}_s} \log(1 - \mathcal{P}_{s,p}) \qquad (1)$$

where $s$ is the student, $p$ is the problem, $\mathcal{C}_s$ is the set of
problems $s$ got correct, and $\mathcal{I}_s$ is the set of problems $s$ got
incorrect. Also, $\mathcal{P}_{s,p}$ is the model-given probability that
student $s$ will get problem $p$ correct.


We will assume that the student must apply all of the asso- 
ciated KCs to solve a given exercise correctly. This is some- 
times called a “conjunctive model” and is a good approach 
for typical K-12 math exercises [8]. This means that the total 
probability of success is the product of the KC probabilities: 


$$\mathcal{P}_{s,p} = \prod_{t,k \in \mathcal{T}_{s,p}} P_{t,k} \qquad (2)$$

where $\mathcal{T}_{s,p}$ is the set of KCs and opportunities such that
problem $p$ is opportunity $t$ for student $s$ to apply KC $k$.


To construct $\mathcal{T}_{s,p}$, one needs a list of KCs associated with each
exercise p, sometimes referred to as the “Q-matrix” [10]. In 
this discussion, we will assume that the Q-matrix is known, 
perhaps determined by the problem author or a domain ex- 
pert. 
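The construction of the opportunity sets and the conjunctive product of Eqn. (2) can be sketched as follows. The Q-matrix contents, the exercise names, the parameter values, and the function names are hypothetical; the sketch also assumes that a student attempts each exercise at most once.

```python
from math import prod

# Hypothetical Q-matrix: exercise -> KCs it requires.
Q = {"p1": ["A"], "p2": ["B"], "p3": ["A", "B"]}

def opportunity_sets(sequence, q_matrix):
    """Return {exercise: [(t, kc), ...]} for one student's ordered exercises."""
    seen = {}     # kc -> number of opportunities so far
    result = {}
    for p in sequence:
        pairs = []
        for kc in q_matrix[p]:
            seen[kc] = seen.get(kc, 0) + 1
            pairs.append((seen[kc], kc))
        result[p] = pairs
    return result

def success_probability(pairs, P):
    """Conjunctive model: product of P[(t, kc)] over the given pairs."""
    return prod(P[pair] for pair in pairs)

# One student solves p1, then p3, then p2.
T = opportunity_sets(["p1", "p3", "p2"], Q)

# Illustrative parameter values, not fitted to anything.
P = {(1, "A"): 0.5, (2, "A"): 0.8, (1, "B"): 0.5, (2, "B"): 0.6}
p3_success = success_probability(T["p3"], P)
```

Here exercise `p3` is the second opportunity for KC A and the first for KC B, so its success probability is the product of those two parameters.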


Numerical Calculation 

The likelihood given by Eqn. (1) is rather inconvenient for
large numerical calculations. Instead, we will introduce variables
that aggregate over student and exercise. Define $C_{t,k}$
to be the number of students in the dataset who correctly applied
KC $k$ on opportunity $t$. Likewise, define $I(\mathbf{t}, \mathbf{k})$ to be
the number of students who got an exercise containing KCs
$\mathbf{k} = \{k_1, k_2, \ldots\}$ incorrect, where $\mathbf{t}$ is a vector of associated
opportunities. This exercise represents opportunity $t_\alpha$ for the
student to apply KC $k_\alpha$. Then, the log-likelihood can be written
as


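$$\log(\mathcal{L}) = \sum_{t,k} C_{t,k} \log(P_{t,k}) + \sum_{\mathbf{t},\mathbf{k}} I(\mathbf{t}, \mathbf{k}) \log\left(1 - \mathcal{P}(\mathbf{t}, \mathbf{k})\right) \qquad (3)$$

where $\mathcal{P}(\mathbf{t}, \mathbf{k})$ is the probability that a student with
opportunity vector $\mathbf{t}$ will have success on a problem containing KCs
$\mathbf{k} = \{k_1, k_2, \ldots\}$. Following Eqn. (2), $\mathcal{P}(\mathbf{t}, \mathbf{k})$ is a product
over the associated probabilities:

$$\mathcal{P}(\mathbf{t}, \mathbf{k}) = \prod_{\alpha} P_{t_\alpha, k_\alpha} \qquad (4)$$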


Note that the first term of Eqn. (3) has a much simpler form 
than the second term. This is due to our use of a conjunctive 
model. If a student gets an exercise “correct” then we know 
without ambiguity that they applied all of the associated KCs 
correctly. However, if they get a problem wrong, then it is not 
clear which KC is to blame and the associated probabilities 
must be considered jointly. 
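The aggregation and the evaluation of Eqn. (3) might be sketched as below. The outcome format, a list of (pairs, correct) records with one record per student and exercise, and the function names are assumptions for illustration; the parameter values in the example are arbitrary.

```python
from collections import Counter
from math import log, prod

def aggregate(outcomes):
    """Aggregate (pairs, is_correct) outcomes into the counts of Eqn. (3).

    `pairs` holds the (t, kc) pairs for one student on one exercise.
    """
    C = Counter()   # (t, kc) -> students who applied kc correctly at t
    I = Counter()   # sorted tuple of (t, kc) pairs -> students who failed
    for pairs, is_correct in outcomes:
        if is_correct:
            C.update(pairs)                # every KC was applied correctly
        else:
            I[tuple(sorted(pairs))] += 1   # blame is shared among the KCs
    return C, I

def log_likelihood(P, C, I):
    """Evaluate Eqn. (3) for parameters P[(t, kc)] strictly inside (0, 1)."""
    first = sum(n * log(P[key]) for key, n in C.items())
    second = sum(n * log(1.0 - prod(P[pair] for pair in pairs))
                 for pairs, n in I.items())
    return first + second

a, b = (1, "A"), (1, "B")
outcomes = [((a,), True), ((a, b), False), ((a, b), True)]
C, I = aggregate(outcomes)
ll = log_likelihood({a: 0.5, b: 0.5}, C, I)
```

A correct answer increments the count of every KC in the exercise, while an incorrect answer is stored against the whole set of KCs, which mirrors the asymmetry between the two terms of Eqn. (3).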


Table 2. KC content of the artificial homework set. Students completed
the first eight problems in the given order and the remaining problems
in random order; they completed between 15 and 20 problems total.

Problem   1   2   3   4   5   6   7   8   9   10
KCs       A   A   A   A   B   B   B   B   A   B

Problem   11  12  13  14  15  16  17  18  19  20
KCs       A   B   AB  AB  AB  AB  AB  AB  AB  AB


Let $\{\hat{P}_{t,k}\}$ be the model parameters at the maximum
likelihood point. $\{\hat{P}_{t,k}\}$ can be found numerically by
maximizing the log-likelihood, Eqn. (3), subject to the constraints
$0 \le P_{t,k} \le 1$. For convenience, the Mathematica function
FindMaximum was used to calculate the maximum of
$\log(\mathcal{L})$. However, any optimization algorithm that enforces
constraints and uses information about the gradient of the
function should work as well.
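For readers without a constrained optimizer at hand, one possible alternative is an expectation-maximization (EM) iteration, sketched below. This is a swapped-in technique, not the FindMaximum procedure used in the paper: it treats the per-KC correctness inside a failed exercise as hidden, and each update keeps the parameters in $[0, 1]$ automatically. The function name `fit_em`, the iteration count, and the example counts are assumptions.

```python
from math import prod

def fit_em(C, I, n_iter=2000):
    """Maximize Eqn. (3) by EM; C and I are the aggregated counts.

    C maps (t, kc) to a count of correct applications; I maps a tuple of
    (t, kc) pairs to a count of failed exercises. All counts are assumed
    to be positive.
    """
    keys = set(C) | {pair for pairs in I for pair in pairs}
    # Attempts per parameter: correct applications plus failed exercises.
    N = {k: C.get(k, 0) for k in keys}
    for pairs, n in I.items():
        for pair in pairs:
            N[pair] += n
    P = {k: 0.5 for k in keys}   # interior starting point
    for _ in range(n_iter):
        expected = {k: float(C.get(k, 0)) for k in keys}
        for pairs, n in I.items():
            joint = prod(P[pair] for pair in pairs)
            for pair in pairs:
                # Probability that this KC was applied correctly, given
                # that the exercise as a whole was answered incorrectly.
                expected[pair] += n * (P[pair] - joint) / (1.0 - joint)
        P = {k: expected[k] / N[k] for k in keys}
    return P

a, b = (1, "A"), (1, "B")
single = fit_em({a: 3}, {(a,): 1})
double = fit_em({a: 11, b: 13}, {(a,): 4, (b,): 2, (a, b): 5})
```

In the single-KC example the iteration returns the plain fraction correct, consistent with the statement that the method reduces to ordinary learning curves. A fixed point of the update satisfies the stationarity condition of Eqn. (3), but a gradient-based optimizer will usually converge in fewer iterations.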


Error analysis 

It is important to calculate the standard errors associated with 
the model parameters. Unlike the single KC per problem 
case, the model parameters may be strongly correlated and 
the errors can have unexpected values. In addition, the error 
analysis can elucidate any cases where the model parameter 
cannot be determined from the data (we will discuss this fur- 
ther in the conclusion). 


Before finding the errors, we need to examine the maximum
likelihood point and identify any parameters that lie
on the boundaries $\hat{P}_{t,k} = 0$ or $1$. The likelihood function $\mathcal{L}$
is not stationary in these parameters at the maximum likelihood
point, so the error analysis cannot be applied to them;
they should not be included in the Hessian matrix below,
Eqn. (5). In practice, this should not be a significant issue, since
$\hat{P}_{t,k} = 0$ or $1$ typically occurs when there are just a few
student problem-solving instances for a given $t$ and $k$.


For a maximum likelihood fit, the standard errors associated
with the model parameters can be determined using the following
procedure [1, 9]. First, we find the Hessian matrix associated
with $P_{t,k} = \hat{P}_{t,k}$. The matrix elements of the Hessian
are given by


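$$\left.\frac{\partial^2 \log(\mathcal{L})}{\partial P_{t,k}\, \partial P_{u,l}}\right|_{P_{v,m} = \hat{P}_{v,m}} = -\frac{1}{P_{t,k} P_{u,l}} \left.\sum_{\mathbf{t},\mathbf{k}} \frac{I(\mathbf{t}, \mathbf{k})\, \mathcal{P}(\mathbf{t}, \mathbf{k})}{\left(1 - \mathcal{P}(\mathbf{t}, \mathbf{k})\right)^2}\right|_{P_{v,m} = \hat{P}_{v,m}} \qquad (5)$$

Only exercises that contain both the pair $(t, k)$ and the pair $(u, l)$
contribute to the sum. The diagonal elements take this form because
the first derivatives of Eqn. (3) vanish at the maximum likelihood point.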


To find the standard error associated with each of the model 
parameters $P_{t,k}$, we invert the negative of the Hessian ma-
trix and take the square root of the diagonal elements. If this
process fails (the Hessian matrix is singular), it is a signal 
that some of the model parameters cannot be uniquely de- 
termined from the given log data. Similarly, if the Hessian 
matrix is nearly singular, then the associated standard errors 
will be very large. This will single out any model parameters 
that cannot be determined from the data. 
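The error procedure might be sketched as follows, using only the incorrect-answer counts, since those are the only data that enter Eqn. (5). The function name `standard_errors` and the singularity threshold are assumptions; the sketch also assumes that every parameter passed in lies strictly inside $(0, 1)$ and that no KC appears twice in one exercise.

```python
from math import prod, sqrt

def standard_errors(P_hat, I):
    """Standard errors from Eqn. (5); P_hat must be an interior maximum."""
    keys = sorted(P_hat)
    n = len(keys)
    index = {k: i for i, k in enumerate(keys)}
    # Negative Hessian of the log-likelihood at the maximum.
    A = [[0.0] * n for _ in range(n)]
    for pairs, count in I.items():
        joint = prod(P_hat[pair] for pair in pairs)
        weight = count * joint / (1.0 - joint) ** 2
        for a in pairs:
            for b in pairs:
                A[index[a]][index[b]] += weight / (P_hat[a] * P_hat[b])
    # Invert by Gauss-Jordan elimination with partial pivoting.
    M = [row + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular Hessian: parameters not identifiable")
        M[col], M[pivot] = M[pivot], M[col]
        scale = M[col][col]
        M[col] = [x / scale for x in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    # Diagonal of the inverse sits in the right-hand half of M.
    return {k: sqrt(M[i][n + i]) for i, k in enumerate(keys)}

a = (1, "A")
# Single KC: 75 of 100 students correct, so P_hat = 0.75 and I = 25.
errors = standard_errors({a: 0.75}, {(a,): 25})
```

For a single KC the result coincides with the familiar binomial standard error, the square root of $\hat{P}(1 - \hat{P})/N$, which is a useful sanity check before applying the procedure to exercises with several KCs.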
Figure 1. Learning curve for the artificial homework set where we
assume each problem has the same single KC. Note the jump after
opportunity 4 due to the fact that the first four and second four
problems have different KCs.

Figure 2. Learning curve for KC A. The solid line is the model used to
generate the student data and the points with error bars represent the
learning curve determined from the student data using our procedure.
Note that the error bars for the last few opportunities are larger, due to
student attrition.


APPLICATION TO STUDENT DATA 

To illustrate how this model works, we will generate an ar- 
tificial student performance dataset. Consider a homework 
assignment of 20 problems that exercise two KCs, A and 
B as detailed in Table 2. We assume that students progress 
through the first 8 problems in the given order, but solve the 
remaining 12 problems in random order, completing between 
15 and 20 problems. We assume that student mastery for the
KCs is given by the functions $P_{t,A} = 0.9 - 0.85\, e^{-0.3 t}$ and
$P_{t,B} = 0.85 - 0.45\, e^{-0.[\ldots] t}$; see Figures 2 and 3. We use this
model to generate a set of outcomes, $\mathcal{C}_s$, $\mathcal{I}_s$, and $\mathcal{T}_{s,p}$, for 100
students.
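The data generation described above can be sketched as follows. The KC layout follows Table 2 and the curve for KC A follows the text; the decay rate for KC B (`RATE_B`) is a placeholder chosen for illustration, and the random seed, the function names, and the outcome format are likewise assumptions.

```python
import math
import random

# KC content of the 20 problems, following Table 2.
KCS = ([["A"]] * 4 + [["B"]] * 4 + [["A"], ["B"], ["A"], ["B"]]
       + [["A", "B"]] * 8)
RATE_B = 0.3   # placeholder decay rate for KC B (an assumption)

def mastery(t, kc):
    """Probability of applying `kc` correctly on opportunity t = 1, 2, ..."""
    if kc == "A":
        return 0.9 - 0.85 * math.exp(-0.3 * t)
    return 0.85 - 0.45 * math.exp(-RATE_B * t)

def simulate_student(rng):
    """Return a list of (pairs, is_correct) outcomes for one student."""
    rest = list(range(8, 20))
    rng.shuffle(rest)                      # remaining problems, random order
    order = list(range(8)) + rest
    order = order[:rng.randint(15, 20)]    # stop after 15 to 20 problems
    seen = {"A": 0, "B": 0}
    outcomes = []
    for p in order:
        pairs = []
        for kc in KCS[p]:
            seen[kc] += 1
            pairs.append((seen[kc], kc))
        # Conjunctive model: every KC must be applied correctly.
        is_correct = all(rng.random() < mastery(t, kc) for t, kc in pairs)
        outcomes.append((tuple(pairs), is_correct))
    return outcomes

rng = random.Random(0)
data = [simulate_student(rng) for _ in range(100)]
```

Each student's record already carries the (opportunity, KC) pairs for every exercise, so it can be aggregated directly into the counts that appear in Eqn. (3).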


If we ignore the KC content of the problems, we can plot a 
naive learning curve for this student data; see Fig. 1. We
see a discontinuity at $t = 4$ due to the change in actual KC
content of the problems. The last problems are more difficult,
since they involve two skills, and so student performance
on them is suppressed.


Next, we use our procedure to generate learning curves and 
associated errors for this dataset. The results are plotted in
Figs. 2 and 3. As expected, they agree well with the model
used to generate the student data, which indicates that the
method recovers the underlying learning curves. Note that the
error bars can vary considerably from point to point.

Figure 3. Learning curve for KC B. The high value at t = 5 is a
statistical fluctuation: as we increase the number of students, the
model parameters will converge to the solid line.


CONCLUSION 

The primary goal of the approach developed here is to plot 
learning curves for cases where there are problems (or prob- 
lem steps) involving multiple KCs. In practice, we find our 
method to be numerically robust (no problems with local 
maxima). 


However, there is one case where it may fail: if there is a 
KC that always appears along with another KC for several 
problems and all the students in the dataset solve nearly the 
same ordered sequence of problems, then there is no way to
distinguish between the two KCs for one or more values of $t$.
This will result in a Hessian matrix whose negative is not
positive-definite, and the matrix inversion will fail. We believe
that this situation will rarely arise in practice, since most
datasets involve students in multiple courses, and students are
generally not forced to solve problems in a specific order.
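The failure case can be illustrated with a small worked example, under the simplifying assumption that KCs $k$ and $l$ occur only together, in a single exercise that is opportunity $t$ for $k$ and opportunity $u$ for $l$, with $C$ students correct and $I$ incorrect (both nonzero). The symbols $q$ and $H$ are introduced here for illustration.

```latex
% Assumption: KCs k and l appear only together, in one exercise type.
\begin{align*}
\log(\mathcal{L}) &= C \log(q) + I \log(1 - q),
  \qquad q = P_{t,k} P_{u,l}, \\
\hat{q} &= \frac{C}{C + I}, \\
H &= -\frac{I \hat{q}}{(1 - \hat{q})^2}
\begin{pmatrix}
\hat{P}_{t,k}^{-2} & (\hat{P}_{t,k} \hat{P}_{u,l})^{-1} \\
(\hat{P}_{t,k} \hat{P}_{u,l})^{-1} & \hat{P}_{u,l}^{-2}
\end{pmatrix},
  \qquad \det H = 0 .
\end{align*}
```

The data fix only the product $q$, so any pair of parameters with $P_{t,k} P_{u,l} = \hat{q}$ maximizes the likelihood, and the vanishing determinant is how that ambiguity shows up in the error analysis.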


In this work, we focused on a “conjunctive model” for com- 
bining KCs, as this is likely the most appropriate model for 
typical math and science exercises. Although the basic strat- 
egy we present here could be applied to other models (dis- 
junctive, compensatory) for combining KCs, the details of the 
associated numerical calculation would look rather different. 


Obviously, the next step is to apply this approach to real stu- 
dent data. This would require a set of exercises that have 
been tagged with multiple KCs, where the mix of KCs varies
significantly from exercise to exercise. In addition, the student
activity would have to be fairly heterogeneous, with different
students taking different paths through the exercises.